Add ROCM_EXPLICIT_CTX support to internode.cu#16

Open
avinashkethineedi wants to merge 1 commit into main from feature/internode-rocm-explicit-ctx

@avinashkethineedi

Motivation

The dispatch and combine kernels had a fixed RDMA context selection. This adds flag-driven selection so the same kernels can be built for explicit, work-group, or context-free modes, making it easier to tune performance and debug context issues.

Technical Details

  • Replaced the 2-way (ROCM_DISABLE_CTX vs. rocshmem_ctx_array) preprocessor branches in the dispatch and combine kernels with a 3-way selection:
    • ROCM_EXPLICIT_CTX → per-SM explicit context (rocshmem_ctx_array[sm_id])
    • default (neither flag) → shared work-group context (ctx)
    • ROCM_DISABLE_CTX → context-free shmem fallback
  • Enabled the work-group ctx create/destroy (shmem_wg_ctx_create / shmem_wg_ctx_destroy) for the default path in both dispatch and combine.
  • Guarded the combine ctx destroy with USE_ROCM so it stays symmetric with its create, avoiding an undeclared ctx reference in non-ROCm builds.
  • Removed a leftover commented-out std::cout debug block and the now-unused <iostream> include.
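
The 3-way selection described above can be sketched in plain host-side C++ as follows. This is only an illustration, not the code from internode.cu: `CtxMode`, `selected_ctx_mode`, and `ctx_mode_name` are hypothetical names, and the order of the branches (ROCM_EXPLICIT_CTX tested first) is an assumption about precedence when both flags are defined.

```cpp
// Hypothetical stand-in for the three context paths; the real kernels
// operate on rocSHMEM contexts rather than on an enum.
enum class CtxMode { ExplicitPerSm, SharedWorkGroup, ContextFree };

// Resolve the mode at compile time from the build flags.
// Assumption: ROCM_EXPLICIT_CTX is tested first, so it wins if both are set.
constexpr CtxMode selected_ctx_mode() {
#if defined(ROCM_EXPLICIT_CTX)
    return CtxMode::ExplicitPerSm;    // rocshmem_ctx_array[sm_id]
#elif defined(ROCM_DISABLE_CTX)
    return CtxMode::ContextFree;      // context-free shmem fallback
#else
    return CtxMode::SharedWorkGroup;  // shared work-group context (ctx)
#endif
}

// Human-readable label, e.g. for logging which path a build uses.
constexpr const char* ctx_mode_name(CtxMode mode) {
    switch (mode) {
        case CtxMode::ExplicitPerSm:   return "explicit per-SM context";
        case CtxMode::SharedWorkGroup: return "shared work-group context";
        case CtxMode::ContextFree:     return "context-free fallback";
    }
    return "unknown";
}
```

Compiled with neither flag defined, `selected_ctx_mode()` resolves to the shared work-group path, which matches the default described in the list.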

Test Plan

  • Built with ROCM_EXPLICIT_CTX enabled and ran the internode stress test for ~150 iterations

Test Result

  • Successfully passed all 150 iterations with no failures.

Submission Checklist

- Convert ctx selection to 3-way: ROCM_EXPLICIT_CTX (rocshmem_ctx_array), default work-group ctx, ROCM_DISABLE_CTX fallback
- Enable work-group ctx create/destroy for the default path in dispatch and combine
- Guard combine ctx destroy with USE_ROCM to match its create
@avbokovoy

avbokovoy commented Jun 15, 2026

Hi @avinashkethineedi, @ahubbe-amd

Thanks for your contribution. I noticed that a user of the DeepEP library can specify both ROCM_EXPLICIT_CTX and ROCM_DISABLE_CTX at the same time, either by passing the --disable-ctx and -DROCM_EXPLICIT_CTX flags to setup.py or by passing -DROCM_EXPLICIT_CTX -DROCM_DISABLE_CTX directly. Could you please confirm whether this scenario should be avoided? If so, let's do the following:

  1. Add an --explicit-ctx flag with a detailed hint to setup.py, together with a runtime check that at most one of the --disable-ctx and --explicit-ctx flags is specified.
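  2. Add a compile-time check to the internode.cu file that fails when both ROCM_EXPLICIT_CTX and ROCM_DISABLE_CTX are defined (an #if defined(ROCM_EXPLICIT_CTX) && defined(ROCM_DISABLE_CTX) followed by #error, since defined(...) cannot be evaluated inside static_assert) to guard against defining both variables.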
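
A compile-time guard along these lines might look like the sketch below. Because `defined` is only evaluated inside preprocessor directives, the sketch uses `#if`/`#error`, and it also shows a `static_assert` variant that first mirrors each macro into a `constexpr bool`. The names `kExplicitCtx`, `kDisableCtx`, and `ctx_flags_are_valid` are hypothetical and do not come from the PR.

```cpp
// Preprocessor form: stop the build as soon as both flags are defined.
#if defined(ROCM_EXPLICIT_CTX) && defined(ROCM_DISABLE_CTX)
#error "ROCM_EXPLICIT_CTX and ROCM_DISABLE_CTX are mutually exclusive"
#endif

// static_assert form: mirror each macro into a constexpr bool first,
// because defined(...) is not valid inside an ordinary expression.
#if defined(ROCM_EXPLICIT_CTX)
constexpr bool kExplicitCtx = true;
#else
constexpr bool kExplicitCtx = false;
#endif

#if defined(ROCM_DISABLE_CTX)
constexpr bool kDisableCtx = true;
#else
constexpr bool kDisableCtx = false;
#endif

// Valid combinations: neither flag, or exactly one of them.
constexpr bool ctx_flags_are_valid(bool explicit_ctx, bool disable_ctx) {
    return !(explicit_ctx && disable_ctx);
}

static_assert(ctx_flags_are_valid(kExplicitCtx, kDisableCtx),
              "Define at most one of ROCM_EXPLICIT_CTX and ROCM_DISABLE_CTX");
```

Either form alone would be enough; the `#error` form reports the problem earliest, while the `static_assert` form keeps the rule in a function that can be checked independently.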

Minor: the name ROCM_EXPLICIT_CTX probably doesn't reflect the purpose of the variable. It might be a good idea to reconsider it.
